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ABSTRACT 

Motivation: Biological networks change in response to genetic 
and environmental cues. Changes are reflected in the abundances 
of biomolecules, the composition of protein complexes and other 
descriptors of the biological state. Methods to infer the dynamic state 
of a cell would have great value for understanding how cells change 
over time to accomplish biological goals. 

Results: A new method predicts the dynamic state of protein 
complexes in a cell, with protein expression inferred from 
transcription profile time courses and protein complexes inferred 
by joint analysis of protein co-expression and protein-protein 
interaction maps. Two algorithmic advances are presented: a new 
method, DHAC (Dynamical Hierarchical Agglomerative Clustering), 
for clustering time-evolving networks; and a companion method, 
MATCH-EM, for matching corresponding clusters across time points. 
With link prediction as an objective assessment metric, DHAC 
provides a substantial advance over existing clustering methods. An 
application to the yeast metabolic cycle demonstrates how waves 
of gene expression correspond to individual protein complexes. 
Our results suggest regulatory mechanisms for assembling the 
mitochondrial ribosome and illustrate dynamic changes in the 
components of the nuclear pore. 

Availability: All source code and data are available under the Boost 
Software License as supplementary material, at www.baderzone.org, 
and at sourceforge.net/projects/dhacdist 
Contact: joel.bader@jhu.edu 

Supplementary information: Supplementary data are available at 
Bioinformatics online. 



1 INTRODUCTION 

Current views of biological networks and pathways are primarily 
static, comprising databases of curated pathways or of pairwise 
interactions, primarily between proteins. Many methods have been 
developed to cluster, partition or segment an interaction network into 
putative complexes. Recent comparisons suggest that hierarchical 
stochastic block models provide the most accur ate reconstruction 
of re al protein complexes from interaction data (iPark and Baderl 
l201ll) . These static views, however, fail to capture the rich dynamic 
structure of a cell. Accounting for dynamic changes in protein 
complexes is crucial to building accurate models of cellular state. 

High-throughput measurements of protein abunda nce are possible 
through label-free quantitative mas s spectrometry |Haqqani et all 
I2008I: ISardiu and Washburnl l201ll: IZhu et all l2010l) . but mavbe 
limited to the most highly abundant proteins. In contrast, transcript 
abundance is more readily available and is often used as a proxy for 
protein abundance. Previous methods combined transcript dynamics 
with interaction databases to create a moving picture of the cell state 
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under the crude assumption of a fixed number of protein complexes 
and with ad hoc criteria to match protein complex membership 
across time points (IParkgf allUmdb . The problem considered here 
is also distinct from evolutionary dynamics, where algorithms have 
been developed to estimate anc estral networks and infer the m ost 
likely evolutionary mechanisms dNavlakha and KingsfordlEoill) . 

This work, in contrast, uses a rigorous probabilistic framework 
to translate a hierarchical stochastic block model to the dynamic 
domain. The number of protein complexes can increase or decrease 
at each time point, and individual complexes can grow, shrink, 
or swap components with other complexes. The resulting network 
dynamics reveals the temporal regulation of cell protein state. 

The starting point is our previous static cl ustering method, 
Hiera rchical Agglomerative Clustering, or HAC dPark and Badenl 
l201ll) . The HAC method maximizes the likelihood of a hierarchical 
stochastic block m o del, a lso known as the likelihood modularity 
dBickel and ChenL l2009h . HAC has the appealing features of 
automatic selection of model size and multi- scale networks views. 
Furthermore, it out-performs leading methods in the task of link 
prediction, an objective performance metric when true group 
assignments are unknown. Extending HAC to dynamic networks 
requires a solution to the identifiability problem: how complexes 
inferred at one time point correspond to complexes inferred at other 
time points. Furthermore, transitions of a protein from one complex 
to another must be permitted by the model, requiring dynamical 
coupling between network snapshots. 

In this work we address these points. First, we convert likelihood 
modularity from maximum likelihood to fully Bayesian statistics, 
which automatically accounts for model complexity and provides 
well-founded criteria for selecting the correct number of clusters. 
Second, we 'kernelize' the likelihood modularity with an adaptive 
bandwidth to couple network clusters at nearby time points, 
simila r to methods for regulatory network inference dSong et all 
I2009I) . We term this the Dynamical Hierarchical Agglomerative 
Clustering (DHAC) method. Finally, we solve the general problem 
of matching clusters across time points with a new belief propagation 
method, MATCH-EM, that extends Expect ation-Maximization and 
belief propagation for bipartite matching dBavati et all I2008I) to 
consistently match multiple time-evolving clusters. We apply these 
methods to real biological data to discover the dynamical structure 
of protein complexes. 

2 APPROACH 

2.1 Dynamic network clustering 

Our approach uses stochastic block models, in which interactions 
are conditionally independent given group membership. These 
models can be hierarchical, with larger complexes containing 
sub-complexes with more fine-grained interaction probabilities. 
Networks are observed at specific time points, termed 'snapshots', 
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and the goal is to infer or estimate the time-dependent block model 
given the snapshots. The model itself is generative. While others 
have explored networ ks generated from a pre- specified model 
(ILeskovec et all [20051) , the focus here is on network inference. 

The observed snapshots are a series of T time-ordered graphs, 
{G^:t=\,...,T}. Each single network G^ = (V^,E^) consists 
of undirected and unweighted binary edges and vertices 
yO) Vertices correspond to proteins, and edges represent possible 
protein-protein physical interactions (PPI). For an arbitrary pair, 
t±t' G w and G ( ^ can have different vertices and edses. 
Generalizations to additional vertex classes (transcripts, genes, 
metabolites) and edge types (directed, weighted, epistatic or 
regulatory interactions) follow directly but are not considered here. 

The goal is to infer a corresponding sequence of time-evolving 
stochastic block models, {M^ :t = 1, T}, where each is a 
good network-generative model for Many methods maximize 
the model for each snapshot independently, obtaining M(t) as 
argmax^ P(M | G^), then attempt to stitch together the results. 
Here, we show that introducing explicit coupling between time 
points improves dynamic network clustering. 

3 METHODS 

3.1 DHAC 

A stochastic block model M is a generative model for a network G. The 
number of vertices is V and the number of possible blocks, groups or clusters 
is K. Typically, indices u,v,wel...V denote vertices, and indices i,j,ke 
1...K denote clusters. The notation u e i indicates that vertex u is in cluster 
i, and n{ is the number of vertices assigned to cluster i. 

The probability that a vertex is in cluster k is Ttk, the parameter for the k- 
th cluster in a multinomial distribution, with J2 k 7tk = l. The parameter 0y = 
6ji e [0, 1] gives the probability of an undirected, unweighted edge between 
any pair of vertices u e i and v e j, modeled as independent Bernoulli trials 
for each pair. This model M generates a network G by first sampling the 
membership of each vertex u with probability Ttk for cluster k, then sampling 
each edge e uv = 0 or 1 as a Bernoulli trial with success probability % for u e i 
and v e j. 

Edge counts are summarized at the cluster level as riy = J2 U €i,v€j e uv for i 
h or H u <vei e uv f° r i = j- It is convenient to keep track of the corresponding 
number of non-edges, or 'holes', hij, with eij+hij = tij, the total possible 
number of edges. For ijLj, tij = nitij. For i=j, tu =«/(«/ ±l)/2, with the '+' 
term for graphs with self-edges and the '— ' for graphs without self-edges. 
Using these sufficient statistics, the probability of a network G given the 
structural model and parameters is 

K 



p(g\{6}att},m) = n^n^ 1 - 

k=l i<j 

3.1.1 Maximum likelihood guide tree Vertices are merged into 
increasingly large clusters based on the model likelihood with maximum 
likelihood parameters Ttk=tik/V and §ij = eij/tij. The change in log- 
likelihood upon merging existing clusters 1 and 2 into a new cluster V is 
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where P^ L = e^j'h i j j /t t j J and the superscript S indicates a single snapshot. 
The first term, arising from the multinomial cluster membership model and 
favoring balanced merges, was not included in HAC but is included in 
DHAC. 

3.1.2 Baye sian collapsing cri t erion A Bayes factor selects the model 
complexity iKass and Raftervl Il995l) . Integration of parameter 9 on a 



single Bernoulli likelihood with a uniform prior, or Beta(l,l), results in 
^6 e {\-6) h de = Beta(e + l,/i+l), or e\h\/(e+h+\)\ for integer values. 
Therefore the marginal likelihood is 
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A similar procedure integrating out the nuisance parameters 7ik with an 
uninformative prior would yield the additional contribution n&n^ + 
1)/T(V+K). Alternatively, integrating out the nuisance parameters using a 
strong prior, P({nk}) oc n^ 71 ^ w ^ tn P seu docount v ^> V, yields a contribution 
that is independent of {nk}. The Bayesian likelihood for the edge terms 
provided sufficient collapsing; we did not include the vertex assignment 
term in the Bayesian likelihood, equivalent to a strong prior. The Bayesian 
log-likelihood ratio for collapsing groups 1 and 2 into V is 
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where the superscript B indicates Bayesian, S indicates a single snapshot, 
and Pfj = Beta(e z y + 1 , hy + 1). This score is additive, and summing over all cp 
scores from the bottom clusters (individual vertices) upwards is equivalent 
to the log-likelihood ratio for the model with collapsed versus uncollapsed 
fine structure, with the collapsed vertices being the top-level groups in a 
stochastic block model. The guide tree is collapsed from the bottom up, 
in the order that groups were merged, to identify a local optimum of the 
cumulative 4> score. 

Our initial methods used the Bayesian likelihood for both the greedy guide 
tree and the collapsing step. A problem with this approach is that the Bayesian 
likelihood includes a contr ibution, asympto tically the Bayes Information 
Criterion (BIC) correction iSchwarj. Il978l) . that favors merges of larger 
clusters with different connectivity patterns over merges of smaller clusters 
with identical connectivity patterns. Consequently, using the Bayesian 
likelihood optimized the local Bayes factor but gave a worse global Bayes 
factor than the maximum likelihood approach, which also has less expensive 
function evaluations. We therefore used maximum likelihood for the guide 
tree and Bayesian likelihood for collapsing. 

3.1.3 Kernel-reweighted scores Kernalization of the scores X and (p 
couples nearby snapshots, also providing noise reduction and smoothing. 
Merging and collapsing scores were kernelized using Gaussian Radial 
Basis functions with width parameter r, w(At, r) ocexp{— | At\/r}, where for 
simplicity At is the difference in snapshot indices. The kernelized merging 
score k K (t) and collapsing score <f> K (t) for the t-th snapshot (K denotes 
kernelized) are 
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Although the same clustering is used across all T time points, the scores will 
differ when proteins (or interactions) are present in one time point and absent 
in another. Kernels are normalized as J]I=i w (* — s > T ) = 1- ^ s T 0, k K -> 
k s and (p K -^4> s . Since k s is statistically consistent (see Supplementary 
Material for proof), X K is statistically consistent as nk^oc and t -» 0. 
Collapsing is then performed as for single snapshots, stopping at the 
maximum of the bottom-up sum, termed (p K (t; r)= 5Z(/;)ecoiiapsed^f T )- 
The overall algorithm is summarized in Algorithm 1 . 

In the DHAC-local method, the bandwidth parameter x for snapshot t was 
selected from a grid-search over x values 0.5, 1.0, 1.5,..., 3. 5 to maximize 
(f) K (t\x), with smaller x favored when the network changes quickly. For 
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the network considered here, x ~ 1 to 2 depending on t. Alternatively, a 
constant value of r may be used for all values of t, which we termed DHAC- 
constant. We set t = 1 for DHAC-constant, although in principle r could be 
optimized by maximizing J2 t (f) K (t; t). In practice, results were very robust 
to the value of r, and the performance of DHAC-local was nearly identical 
to DHAC-constant with r = 1 (see Section 4). 



Algorithm 1 DHAC 

forf«-l...rdo 

Set each vertex to be a single cluster 

Let 0 cum <- 0 be cumulative model comparison score [Equation j4j] 
Compute merging scores [Equation J3j] of paris having an edge or one 
or more shared neighbors 
repeat 

Pick a pair ij of maximum Xfj (t; r) 

Update scores of affected pairs after merging i,j 

Merge i,j to i' 

Compute merging scores i'J for all j with eyj > 0 or with 

Update (f) CU m(f, T) ^-0cum + 0f (^; t) 

until no more pairs to merge 

Output group structure M(t; r) at which 0 C um(^ r) was maximum 
end for 



3.2 Cluster matching algorithm 

DHAC-constant and DHAC-local output 7 models, {Mi , . . . ,Mj], and many 
groups will change slowly between time points. The total number of groups 
may differ between time points, however, and even if the number of groups 
and the group membership are nearly identical, group order may be permuted 
across time points. Matching similar groups across time points remains a 
general problem for dynamic networks. 

For T = 2 groups, reasonable yet ad hoc procedures are to match groups 
based on shared members, Jaccard correlation of shared neighbors or 
maximum weighted matching of shared neighbors or other pairwise scores 
jBavati gf fl/ll2008l) . Here, we extend these ideas to multi-partite matching 
based on a novel probabilistic model that introduces some rigor to the time 
course matching problem. 

The goal is to find most probable mapping of cluster i at time t to a globally 
consistent index k. Let z% = 1 if cluster i of snapshot t is assigned to k, and 
0 otherwise, with normalization J2k z fk = Conversely, the sum over local 
clusters, J]/ 4?' * s not & XQ d because the global cluster may be absent at time 
t (sum = 0) or it may be broken into multiple smaller clusters (sum > 1). 

Each cluster i contains original network vertices {u} c V, and counts 
the number of shared members between group i at time t and group j at time 
t+1. The probability that a vertex makes a transition from global state k to 
state k' between two snapshots is fe/, with normalization J2k' ^kk' = 1- For 
simplicity, tykk' is independent of t. When groups do not change over time, 
\(/ kk , =8kk', 1 if k = k' else 0. Similarly, the time-independent parameter v u k 
is the probability that vertex u is in global group k, with J2 k v u k = 1. 

The matching probability under consistent indexing is 
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where S t denotes the set of clusters at snapshot t and Q the set of vertices 
in one of these clusters. 



We solved the maximum a posteriori (MAP) inference problem using 
Expectation-Maximization (EM). The M-step updates are 
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The E-step for Zikit) is more complicated. If the state at time t is represented 
as the assignment matrix {zik(t)}, then the probability structure is a hidden 
Markov model (HMM). This state space is large, however, on the order 
of K K ~K\, because each of the approximately K clusters at time t may be 
assigned to one of K global clusters, and the transition matrix is of order K 2K . 
Instead, we simplify the state space by considering each Zik(t) independently 
and introducing additional couplings that create loops in the corresponding 
graphical model, no longer permitting a dynamic programming solution. 
When groups are stable over time, however, the t opology is close to a tree 
structure and belief propagation (BP) works well iYedidia et <3/.Ll2QQ5h . 

For max-product BP algorithm we reformulate the above Markov Random 
Field, or joint probability [Equation Q], constructing a factor graph 
consisting of factors (hyper-edges) and variables (latent variables). Latent 
variables take on values from 1, . . . ,K, or succinctly [K]. In other words, 
zf ) provides the index k of the global cluster for which z% = 1 . Parameters 
{v} are used to represent singleton factors and {^} pairwise factors. A certain 
latent variable zf^ depends on neighboring pairwise factors N(i,t—1) from 
the previous snapshot and N(i,t + l) from the subsequent snapshot. MAP 
inference is carried out by sending messages from i to j via pairwise factor 
e. The update equations of the message m^ e from variable i at time t to 
factor e and then the message m e ^j from e to variable j at time t+ 1 is 

mi^ e (k)ccY[vuk ]~[ m f ^i(k) (8) 

ueQ f€N(i,t-l)UN(i,t+l)\{e} 
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For variable j at time t — 1, the message m e ^j is 



m e ^j(k) oc max -j / e [K] : yj/^ 1 m^ e {l) 
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The belief b{ of a certain variable i at snapshot t is the product of incoming 
messages, 
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normalized as J2 k bi(k) = l. To prevent the MLEs and BP steps from 
overshooting, parameters and messages were updated as 1/10 of the full 
change, with undate^x^Tiessa£esj^ 

and Friedman* l2009h . We call this EM method MATCH-EM (Algorithm 2). 

3.3 Dynamic network data generation 

Dynamic biological networks were obtained by combining experimental 
gene expression time series data with static protein interaction networks 
to project out the consistent edges, both active (two interacting proteins 
are expressed) and inactive (neither protein is expressed). This method 
assumes that presence of a protein is related to transcriptional abundance 
of the corresponding transcript at a nearby time, with possible delays due 
to translation and protein lifetimes. More realistic models are possible and 
should yield more accurate results (see Section 5). 

Time-series measurements of the expression levels of N genes across 
T time points generates a N x T matrix X . Each element X ut corresponds 
to the expression of gene u at snapshot t. The matrix X is assumed to 



i42 



How networks change with time 



Algorithm 2 MATCH-EM 



Initial greedy matching 
Initialize v and \jr 
repeat 
repeat 

while forward and backward visit of factors do 

Calibrate messages i to j [Equations |8), J9)> QoJ] 
end while 

for each variable i do 

Update belief h[ [Equation ( 1111 1 
end for 
until convergence of BP 

Update latent variables ztk = 1 with k = argmax&;(/) and Zjk' = 0 f° r 



other k'^k. 

Update v, xjr by MLE [Equations <EJ, Q] 
until convergence of EM 



positive count is FP=I{e^ > 6 A eil = 0}, the true negative count is TN = 
I{e ( uv <0a eil = 0}, and the false negative count is FN = /{e$ < 0 A e$ = 1 }• 
A precision-recall curve (PRC) is then created from the precision and recall, 



Precision = : 



TP 



Recall = : 



TP 



TP + FP' TP + FN' 

Similarly, a receiver operating characteristic (ROC) curve is generated from 
the true-positive and false-positive rates, 

TP _ FP 



TPR = 



TP + FN' 



FPR = 



FP+TN 



The dynamic clustering methods are then compared using the area under 
the PRC (AUPRC) and the area under the ROC (AUROC). The AUPRC 
measures the average precision for known positives, which is a useful 
measure for link prediction where classes are skewed with far fewer known 
positives than known negatives. Interaction data are very skewed, with only 
about 1% fraction of positives. The AUROC has the theoretical benefit of 
being invariant to class bias. It is less informative in practice but is included 
here because its use has become standard. 



be preprocess ed and normalize d, here performed with gcrma quantile- 
normalization <Wu et fl/.LEQ0l . Next it is row-standardized to have zero 
mean, J2t X ut = 0, and equal variance, J2t X ut = T — 1, for each gene. 

The dynamics of the network were then inferred from X, under the 
assumpti on that proteins in a complex have correlated gene expression 
profiles Ijansen et all |2002|) . To account for transient complexes and 
cases where delays due to translation and protein lifetime are important, 
correlations were averaged over a bandwidth r, 

T 

X uv (t) = J2 W ^ ~ S > X ^ X us X vs 

5=1 

with the Gaussian kernel function w(Af,T)ocexp(— | Af|/r) and normalized 
to 1. Although this bandwidth x has a similar role to the bandwidth for 
likelihood kernelization, it was not optimized but rather set to 1.5. Results 
were quantitatively similar for r from 1.2 to 2. Smaller values of x result in 
stricter co-expression requirements and result in a sparser network. 

Each edge is then declared present or absent based on the value of X uv (t): 
for each snapshot t = 1, . . . , T, a dynamic edge e uv (t) = 1 if and only if X uv (t) > 
0 and e uv = 1 in static network. This procedure retains edges at time t where 
both proteins are present (X US ,X VS > 0) or both absent (X US ,X VS < 0) for times 
s close to time t. We found that using the negative evidence improved the 
prediction of protein complexes, and that the transcriptional data could then 
be used to identify which complexes or subunits were present or absent at 
each time point. Results were stable for less stringent thresholds, X uv (t)> 
-0.5. While this method is appropriate for periodic processes, other methods 
for extracting time- dependent interactions may be more appropriate for more 
general processes (see Section 5). 

3.4 Performance evaluation 

3.4.1 Held-out link prediction Link prediction accuracy for held-out 
edges provides an objective measure of clustering performance for real- world 
data where the true group structure is fundamentally unknown ( Henderson 
et fl/.. l2010HPark and Badeill201ll) . 

At each time point, we randomly select pairs of vertices (u,v), some 
connected at time t with e uv (t) = 1, and others unconnected with e uv (t) = 0, 
the relative fraction of connected pairs (edges) and unconnected pairs (holes) 
matching the network as a whole. These pairs are then a test set, and the 
remaining edges serve as the training set. After clustering based on the 
training set, vertex u will be assigned to some group i, and vertex v will 
be assigned to group j. The maximum likelihood probability of the (u,v) 
edge, denoted e$, is then ef = ef/(ef +hf). 

Varying a threshold 9 for e^l, or in practice ranking pairs in decreasing 
order of e®, the true positive count is TP >9 A 45 = 1}> the false 



3.4.2 Competing methods We compared the following algorithms: 
DHAC-constant, dynamic clustering with a constant fixed bandwidth (t = 1 
for the link prediction experiments); DHAC-local, bandwidths adaptively 
optimized for each snapshots (r =0.5, 1.0, 1.5,.. .,3); HAC, DHAC with 
bandwith x = 0, simi lar to HAC-ML method b ut using bottom-level clusters 
for link pre diction IPark and Badeit l201ll) ; and CNM, fast modularity 
optimization jClauset g ^I[[20 04|). We initially considered Variational Bayes 
Modularity (VBM jHofman and Wigginsll2008l) but did not include it because 
it is slower and is often trapped in bad local optima. Initially we attempted to 
model network dynamics with a Marko v model, similar to a M arkov chain of 
static exponential random graph model lHanneke <?f <3/.ll20ld) . We found that 
kernelizat ion, used previousl y in the KELLER algorithm for transcriptional 
networks <Song et fl/.Ll2009h . provides better performance. In contrast, the 
Markov chain approach performed worse than DHAC and only slightly better 
than HAC (results not shown) and is not included in the comparison. For 
more extensive comparison of other static clustering methods we refer to 
our previous studies, which identified HAC-ML as the best-performing link- 
prediction method for larg e networks, including c luster-free link prediction 
by graph diffusion kernels jPark and Baderll201ll) . 



4 RESULTS 

4.1 Link prediction performance 

4.1.1 Drosophila networks As a proof of concept we first tested 
our algorithm on a dynamic network for Drosophila development, 
for which a gene expression time course is available ( Arbeitman 
et a/.. l2002h . Rather than analyzing the expression data directly, 
we relied on previous analysis using KELLER to identify time- 
varying regulatory interactions between g enes, yielding a n etwork 
with 66 time points and 588 gene vertices (ISong et a/.Ll2009h . Thus, 
genes u and v are connected at time t e 1 ... 66 according to these 
previous results, defining a sparse time-varying network with mean 
vertex degree ~6.5. Since gene interactions were generated with 
time smoothing, DHAC-constant and DHAC-local are expected to 
outperform static methods. 

In extensively cross-validated link prediction performance, 
DHAC-constant and DHAC-local are seen to be far superior to the 
next-best method, HAC, which in turn dominates CNM until ~30% 
of the true edges are removed (Fig.[T|). To perform these studies, from 
5% to 80% of the known edges were removed; results were averaged 
over the 66 time points; and the entire procedure was repeated more 
than 10 times. 
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Fig. 1. Link prediction results for Drosophila networks. (A) average AUPRC 
scores for different methods (y-axis) along different missing link ratios (x- 
axis); (B) AUROC scores for different methods (y-axis) along different 
missing link ratios (x-axis). Points and lines: averge time-cumulative 
performance; shaded area: 1 -standard error. See Section 3 for details 

The similar results of DHAC-constant and DHAC-local point to 
robust behavior with respect to the kernelization parameter r. The 
improved performance of CNM relative to HAC at a high frequency 
of missing links may be due to the tendency of CNM to generate 
large clusters and to lose resolution. The resolution limit is usually 
a drawback, but here is beneficial for link prediction in a sparsified 
network. Even in this limit, however, DHAC remains superior by 
drawing information from adjacent timepoints. 

4.1.2 Yeast Metabolic Cycle networks We then tested link 
prediction accuracy on Yeast Metabolic Cycle (YMC) networks. 
The YMC networks started with a large-scale protein-protein 
interaction dataset (B ioGrid 3.1.81 with 63 410 physical interactions 
and 4342 proteins, I Stark et all 120061) . Requiring support by 
two or more publications, a criterion used previously by others 
dBandvopadhvav et q/lboioh . retained 13 401 interactions and 3248 
proteins. This physical network was combined with data from YMC 
gene expression microarrays over 36 time points showing 3510 
significantly periodic genes, of which 2979 occur in the physical 
network etal 1 120051) . We retained edg es e\^l that connected 
periodic genes and were observed for at least two values of t. 
Snapshots on average contained 1380 proteins with degree 1.8, 
sparser than the Drosophila network. The union over all snapshots 
contained 1575 proteins. 

As before, DHAC methods clearly outperform static clustering 
methods for link prediction on YMC data (Fig. [2]). The performance 
of DHAC-local is slightly better than DHAC-constant. HAC 
performs better than CNM for AUPRC, but CNM performs better 
for AUROC. This follows the trend seen with the Drosophila data, 
where resolution loss improves the relative performance of CNM 
for sparse networks. 

4.1.3 Static versus dynamic edge removal The link prediction 
results described above used a protocol in which the held-out edges 
were resampled for each snapshot. Thus, noise is uncorrected 
between snapshots, and dynamic smoothing suppresses the noise 
by time- averaging because the full gold standard changes slowly 
(due to kernelization with KELLER) or not at all (with the yeast 
physical interaction network). We therefore tested an alternative 
link prediction scheme for the YMC data in which the held-out 



Fig. 2. Link prediction results for YMC networks. (A) average AUPRC 
scores for different methods (y-axis) along different missing link ratios 
(x-axis); (B) AUROC scores for different methods (y-axis) along different 
missing link ratios (x-axis). Points and lines: average performance; shaded 
area: 1 -standard error. See Section 3 for details 

edges are systematically removed across all snapshots, eliminating 
the advantage of time averaging. In this case, dynamic smoothing is 
not expected to help, and the performance of DHAC indeed fell to 
the performance of the static HAC algorithm (results not shown). 

4.2 YMC dynamics 

YMC transcriptional profiling reveals three dominant metabolic 
states: reductive building (RB, 977 genes); reduct ive charging (RC , 
1510 genes); and oxidative (OX, 1023 genes) (iTu et all Eooi) . 
Almost a half of total genes oscillate along this cycle, indicating 
that a broad swath of processes are involved but making it difficult 
to extract specific dynamical modules from expression data alone. 

Prior to clustering, the network used for link prediction was 
made less sparse by applying an iterative degree cutoff (>3). 
Combining with the 36 time-varying snapshots, 3 complete cycles 
of 12 snapshots each, reduced the size of the network from 1380 
proteins per snapshot to 480±14 and increased the mean vertex 
degree from 1.8 to 6.6. Networks were clustered by DHAC-local. 
Clusters were matched across time points using MATCH-EM to 
yield 31 complexes with a total of 613 proteins. 

We checked robustness using a bootstrapping procedure in which 
a fraction a of edges are randomly rewired according to the degree- 
consistent configurational model (iKarrer et all 120081) . We used 
a =0.01 and performed 500 bootstraps, with ~80% co-membership 
conserved across bootstraps at each snapshot. 

4.2.1 Macro-view of YMC complexes We recovered 3 1 dynamic 
complexes with at least 3 proteins and bootstrap co-membership 
~80% (Fig. [3]). Many of the complexes have cluster- specific gene 
ontology (GO) keywords with P-value <0.05. Organizing clusters 
by average gene expression at each time point separates those that 
are active in each phase. RB clusters, #1 to #10, are related to cell 
cycle checkpoints and mitochondrial translation. OX clusters, #11 
to #20, include ribosome metabolism, DNA replication/repair, and 
translation. RC clusters, #23 to #31, include stress response and 
transport. 

Most of the complexes can be matched across the entire time 
course, but some disappear then reappear. An example is complex 
#4, annotated for DNA repair that is most active at the end of each 
12-point cycle. This behavior required the MATCH-EM algorithm 
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Fig. 3. Dynamic network clustering reveals a detailed global view of periodic protein complexes during the yeast metabolic cycle. Squared nodes represent 
clusters matched across time points, showing only clusters having at least three genes/proteins. Cluster order: clusters are organized by peak activity in RB 
phase (#1 to #10), OX phase (#11 to #20) and RC phase (#23 to #31). (Color code of cluster index: predicted clusters were matched with known complexes (see 
the text); cluster indexes are differently color-coded by Jaccard correlation; yellow for a good match (Jaccard correlation >80%), green for moderate similarity 
(correlation >20% and <80%), and gray for poor overlap (correlation <20%)). Node size: number of genes/proteins contained in this cluster. Node color: 
average standardized gene expression level at time t. Edge width: Jaccard coefficient (or coherence) between clusters of adjacent snapshots. Gene Ontology: 
cluster- specific GO keywords were identified by hypergeometric tests. The right panel shows the top three enriched GO categories with P-value <0.05 



for globally consistent clusters, and would have been impossible to 
resolve given matching to nearest neighbors alone. 

We ascertained whether the complexes predicted by our methods 
correspond to known complexes obtained from manual curation, 
CY C2008. or from high- throughput experiments. YHTP2008 (T u 
et flL l2009h . The 408 manual and 400 high-throughput complexes 
were filtered to retain the periodic proteins from YMC data, and then 
the catalog complex with the best Jaccard correlation was identified 
for each predicted complex. Of the 31 predicted complexes, 14 are 
poorly represented in the catalogs (Jaccard correlation <20%), 11 
are only moderately similar (correlation >20% and <80%) and 6 
have a good match (correlation >80%). The predicted complexes 
with poor overlap often recombine subunits from multiple catalog 
complexes (see #16 below). 

To test the effects of the filtering, we also performed clustering 
using all 63 410 BioGrid interactions and including all genes 
with YMC data, periodic or non-periodic, yielding a network 
of 54758 interactions among 4987 proteins. Clustering this 
network and retaining complexes with at least three proteins and 
edge density >0.1 yields 20^0 clusters at each snapshot with 
900±100 proteins included. Most clusters in the unfiltered network 
contains a high-degree core from the filtered network. Occasionally 
multiple cores are combined by low-degree connections, making 
the cluster count smaller than in the filtered network. The 
overlap with protein complex catalogs is similar to the unfiltered 
network. 



4.2.2 Micro-views of YMC dynamics The protein complex 
dynamics provide a rich view of YMC providing new biological 
insight, as demonstrated by in depth analysis of clusters #7, the 
mitochondrial ribosome and cluster #16, the nuclear pore. 

Mitochondrial ribosome complex (#7) The mitochondrial ribosome 
is generally assumed to be RB-specific, with transcription switched 
on briefly at the transition from OX to RB (Fig. [4}. This 
complex contains primarily RSMs (ribosomal small subunit of 
mitochondrias) and MRPs (mitochondrial ribo somal proteins) , 
known components of the mitochondrial ribosome JSaveanulEoOlt) . 

Underneath this general pattern, however, RSM22 shows 
systematic expression ahead of other components. At time 
points t = 9, t = 20, and t = 32, RSM22 is active whereas other 
proteins are not transcribed. RSM22 is a nuclear-encoded putative 
S-ade nosvlmethionine (SAM) methvltransferase ( Petrossian and 
Clarke. 120091) . and methvlation of the 3 r -end of the rRNA of the 
small mitochondrial subunit is re quired for the assembly a nd stability 
of the mitochondrial ribosome dMetodiev et al lHooi). Deleting 
RSM22 yields a viable cell with non-functional mitochondria. 
Together, these results suggest the hypothesis that early expression of 
RSM22 may provide the methylation activity necessary for assembly 
of the mitochondrial ribosome. 

Nuclear pore complex (#16) Most genes in the nuclear pore 
complex are OX-responsive and the complex is most active at 
t = 9, 20, 32 (Fig. [5}. Unlike the mitochondrial ribosome, where 
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Fig. 4. Cluster #7, mitochondrial ribosome. Top: cluster members for the 36 gene expression snapshots. Bottom: Average expression for the three YMC 
phases. Node color: standardized gene expression level. Gene names were colored red or blue if expression values are above 0.5 or —0.5 respectively 
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Fig. 5. Cluster #16, nuclear pore complex. Top: cluster members for the 36 gene expression snapshots. Bottom: Average expression for the three YMC phases. 
Node color: standardized gene expression level. Gene names were colored red or blue if expression values are above 0.5 or —0.5 respectively 



the entire complex is generally transcribed in synchrony, this 
complex shows a smaller co-expressed core that is complemented 
with transient members during the OX phase. While it combines 
subunits of several annotated complexes, it has poor overlap with 
any single complex. Its best overlap is a 15% Jaccard correlation 
with high-throughput complex CID15 from YHTP2008. 

The co-expressed core incl udes nuclear pore comp lex (NPC) and 
Karyopherin (KAP) proteins dPemberton q/.lll998l; Strambio-De- 
Castillia et al l2Q10h . The physical structure of the NPC comprises 



mostly NUP proteins. Among the proteins included in cluster 
#16, NUP 2, NUP100 and NUP116 shape the Phe-Gly passage of 
the NPC dStrambio-De-Castillia eFali l2010h . In contrast, KAP 
proteins are not considered stru ctural but rather med iate export 
and import of R NA and proteins dGriinwald et a/.[|201ll; Strambio- 
De-Castillia et a/.. l2010h. KAP123 and PSE 1 specifically transport 
ribosomal proteins dSchlenstedt et a/.L[l997h . During the OX phases, 
SRP1 and SXM1 are additionally recruited. These KAP proteins 
recognize either nuclear localization sequences (NLS) or nuclear 
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export sequences (NES) and direct transport into or out of nucleus 
JPemberton et a/.LIl998h . 

Other transient memberships suggest additional hypotheses. 
RRP4 and RRP42 are a part of the e xosome that edits RNA 
molecules 3 f -> 5 r dlVlitchell et all Il997h . Our clustering predicts 
that these proteins transition between the nuclear pore and other 
complexes during the cycle. CSL4 was recently r eported to i nterac t 
with RNA and is a possible exosome component ihiu et q/.Ll2006h . 
LHP1 is a La protein that binds to RNA polymerase III transcripts 
and small ribonuclear proteins (snRNPs), working as a molecular 
chape rone to protect and terminate the 3'-end of transcripts (T oo and 
Wolin. ll994l) . These results are consistent with the hypothesis that 
RNA processing is tig htly coupled to transport through th e nuclear 
pore to the cytoplasm dStrambio-De-Castilha et alluOld) . but also 
suggest that dynamic reorganization of the nuclear pore occurs 
during the metabolic cycle. Additional evidence is the appearance 
of a second expression peak involving a subset of nuclear pore 
components at the start of the RB phase, which has not been 
previously described. 

5 DISCUSSION 

Dynamic network clustering is an increasingly important problem 
across diverse disciplines. Our algorithm optim izes the likelihood 
modu larity, which is asymptotically consistent dBickel and Chenl 
l2009h . Other machine learning and physics approaches are based on 
probab ilistic graphical models such as Latent Dirichlet Allocatio n 
(LDA, lAiroldi et all 120081 : iBall et all |201ll: iBlei et all 120031) . 
Dynamic extensions have been proposed (iFu et a/.Ll2009h . but prior 
to our work have been impractical except for very small networks 
with around 100 vertices and under 1 0 latent classes. Even efficien t 
variational methods such as VBM (iHofman and WigginsL 120081) 
have scaling that is far worse than a near linear or at least quadratic 
run time in the number of nodes and edges. 

Our DHAC algorith m scales as 0(£71nV), the same as HAC 
jPark and BadeAEoiH) , with a constant pref actor for the number of 
time points. This provides an excellent trade-off for genome-scale 
problems. Networks considered here with 2000 vertices required 
about 5 min on a single 2 GHz processor. A full-genome network 
with 10 000 to 100 000 vertices could be analyzed in a day to a week 
on single processor, but in practice would be much faster because 
each time point could be run in parallel. 

The cluster matching algorithm MATCH-EM is a second 
contribution that provides a solution to the general problem of 
tracing the evolution of a set of groups or clusters over time. 
It general izes a previous be lief propagation method for bipartite 
matching feavati et a/.Ll2006h . The bipartite max-product algorithm 
is exact with a worst-case run-time of 0(X 3 ) for K classes. Our 
generalization has an additional linear factor of the number of time 
points. While it is no longer guaranteed to converge to the exact 
solution, for biological networks here it converges rapidly with good 
results. 

Our methods applied to real biological data provide new insight. 
Many transcription time course experiments reveal waves of 
correlated gene expression, with no standard methods to parse a 
large set of correlated genes into well-defined protein complexes. 
The DHAC method is a general solution to this problem and 
provides a multi-resolution view of dynamic expression and 
organization of the proteome. Focusing on specific predicted 



complexes reveals possible mechanisms of regulation and control. 
Our analysis of the yeast metabolic cycle identifies protein 
complexes with asynchronous gene expression, which suggests 
RSM22 as an RNA methyltransferase whose early expression may 
be required to assemble and stabilize the mitochondrial ribosome. 

Our methods permit proteins to switch between complexes over 
time, which we see in the dynamics of the nuclear pore. Hierarchical 
methods like DHAC also provide a natural multi- scale description 
of complexes, subcomplexes and proteins. A separate challenge is 
introducing mixed membership, with the s ame protein servin g as a 
subunit in two distinct protein complexes dPalla et a/.Ll2Q05h . 

Several improvements to DHAC are possible. Previous work 
showed that the hierarchical structure inferred from static networks 
corresponds to levels of biological organization, pathway to 
complex to subcomplex and the fine structure underneath a 
collapsed complex can also be used to improve link prediction 
dPark and Badel l201ll) . In the current work, however, we lacked 
a method to match the dynamically evolving hierarchical structure 
across snapshots. Consequently the focus here is on the bottom-level 
clusters rather than the hierarchical structure. 

This work assumes that the population average transcription 
data is a good representation for the transcriptional state of each 
cell. In reality, individual cells may differ from the mean. In the 
yeast metabolic cycle, for example, about half of the cells undergo 
cell division per metabolic cycle, potentially yielding two distinct 
cell populations. More advanced methods have been proposed to 
incre ase resolution and drive toward single-cell models ( B ay m 
et fl/.. l2008l) . " — — 

Direct measurements of protein abundance through quantitative 
mass spectrometry could improve the analysis and would be 
intriguing to combine with expression data. For transcription data, 
protein abundance may be better estimated by a transcription- 
translation model, P(t) = fiR(t)-aP(t), where R(t) is the 
measured transcriptional abundance, P(t) is the abundance of the 
corresponding protein and and a are production and degradation 
rates. This model generates exponentially weighted smoothing of 
protein abundance, similar to the exponential kernel we used for 
smoothing. Since the exponential smoothing kernel already works 
well, we anticipate that results should be robust to choices of /3 and 
a, with the possibility of using consensus values for most proteins. 
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